
Netlist hash diagnostic #4232

Merged
maliberty merged 3 commits into The-OpenROAD-Project:master from oharboe:netlist-hash-diagnostic
May 16, 2026

Conversation

@oharboe
Collaborator

@oharboe oharboe commented May 15, 2026

Reduce wild goose chases with helpful diagnostics

oharboe and others added 2 commits May 15, 2026 12:48
flow/README.md "Triaging a failing _test" → "Yosys-environment false
positive" already calls out that bazel-built yosys and make-built
yosys can produce different 1_2_yosys.v for the same RTL, drifting
QoR past rules-base.json thresholds even though OpenROAD itself is
bit-deterministic.  Today the only way to spot the drift is to
re-run `@bazel-orfs//:make-yosys-netlist`; if a designer has a
freshly-failing _test in front of them, they have no way to see
"the yosys netlist changed" vs "a real regression".

Persist a fingerprint in rules-base.json instead so the next _test
just prints it:

  - genMetrics.py: emit `synth__canonical_netlist__hash` (SHA-1 of
    1_1_yosys_canonicalize.rtlil, the canonical RTLIL pre-ABC) and
    `synth__netlist__hash` (SHA-1 of 1_2_yosys.v, post-ABC).  Having
    both lets the user see whether divergence is in the front-end
    flow or downstream of ABC.

  - genRuleFile.py: new `literal` mode passes the metric value
    through verbatim (no padding / rounding / float coercion).  The
    two hash fields use it with `level: warning` + `compare: ==`,
    so `_update` writes them into rules-base.json and `_test`
    treats a hash mismatch as a diagnostic, not a failure.

  - checkMetadata.py: a warning-level mismatch previously printed
    "[WARN] field pass test: a == b" — confusing when a != b yet
    "pass" implied a match.  Say "differs" instead so the hash
    mismatch reads naturally without changing the no-error contract
    (still doesn't increment ERRORS).

rules-base.json files are unchanged in this commit; the next
`_update` per design adds the two hash fields automatically, and
existing _test runs are unaffected until that happens.
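The three bullets above can be sketched in Python. This is a minimal illustration, not code lifted from the actual scripts — the helper names and the exact message format are assumptions:

```python
import hashlib


def file_sha1(path):
    """Fingerprint a result file (e.g. 1_2_yosys.v) in chunks,
    so large netlists are never read into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_literal(field, rule_value, metric_value):
    """'literal' mode: compare verbatim, no padding/rounding/coercion.
    A mismatch is reported as a warning-level diagnostic ('differs'),
    never counted as an error."""
    if metric_value == rule_value:
        return f"[PASS] {field}: {metric_value} == {rule_value}"
    return f"[WARN] {field} differs: {metric_value} == {rule_value}"
```

On a drifted yosys netlist, the `_test` output would then read as "hash differs" rather than a confusing "pass", while the exit status stays clean.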

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
genElapsedTime.py used to pick exactly one of .odb / .rtlil / .v as
"the result file" for a log and hash that.  Synth produces both
1_2_yosys.v and 1_2_yosys.sdc; OpenROAD stages produce both an
.odb and a .sdc.  Folding all of them into one column means a
divergent .sdc (an SDC-generator change) and a divergent .odb (a
real flow change) look identical in the elapsed-time triage table.

Replace `get_hash` with `get_hashes` that returns every result-file
extension that exists, and emit one row per (stage, extension)
with the elapsed time and peak memory on the stage's first row
only.  Column order is `.v / .rtlil / .odb / .sdc` so the primary
data artifact comes first and the constraint file last.

Added a regression test (`test_emits_one_row_per_result_extension`)
to lock the two-row-per-stage behaviour in; existing tests, adjusted
for the new Ext column, cover the no-result-file fallback (single
row with ext="" and hash=N/A).

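The shape of the change can be sketched as follows — a simplified model, with illustrative names (`RESULT_EXTS`, `rows_for_stage`) rather than the real `genElapsedTime.py` API:

```python
import hashlib
import os

# Primary data artifact first, constraint file last, per the column order above.
RESULT_EXTS = [".v", ".rtlil", ".odb", ".sdc"]


def get_hashes(result_dir, stage):
    """Return {ext: sha1} for every result file of this stage that exists,
    instead of picking exactly one 'result file' as get_hash did."""
    hashes = {}
    for ext in RESULT_EXTS:
        path = os.path.join(result_dir, stage + ext)
        if os.path.exists(path):
            with open(path, "rb") as f:
                hashes[ext] = hashlib.sha1(f.read()).hexdigest()
    return hashes


def rows_for_stage(stage, elapsed, peak_mem, hashes):
    """One row per (stage, extension); elapsed time and peak memory
    appear only on the stage's first row.  No result file at all
    falls back to a single row with ext="" and hash=N/A."""
    if not hashes:
        return [(stage, "", elapsed, peak_mem, "N/A")]
    rows = []
    for i, (ext, digest) in enumerate(hashes.items()):
        rows.append((stage, ext,
                     elapsed if i == 0 else "",
                     peak_mem if i == 0 else "",
                     digest))
    return rows
```

With this, a divergent .sdc and a divergent .odb show up on separate rows of the triage table instead of being folded into one column.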
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe oharboe requested a review from maliberty May 15, 2026 11:00

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the flow's reporting and metadata tracking by introducing multi-file hashing and a "literal" rule mode. Key changes include updating genElapsedTime.py to report hashes for multiple result extensions (including .sdc), adding netlist hash extraction to genMetrics.py, and implementing a "literal" mode in genRuleFile.py to support non-numeric metrics like SHA-1 fingerprints. Additionally, checkMetadata.py was updated to provide clearer warning messages when metadata values differ, and a new test case was added to verify the multi-row output in elapsed time reports. I have no feedback to provide as there were no review comments.

- genElapsedTime.py: hash .def / .spef / .gds in addition to .v /
  .rtlil / .odb / .sdc so the elapsed-time table covers all primary
  flow artifacts, not just the synth + odb subset.
- genMetrics.py: drop the flow/README.md "Triaging a failing _test"
  pointer next to the netlist-hash metrics; the triage section will
  land in a later PR.

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe oharboe requested a review from maliberty May 16, 2026 10:17
@maliberty maliberty merged commit 2f6e9c9 into The-OpenROAD-Project:master May 16, 2026
9 checks passed
@maliberty
Member

Note for the future: we are working towards a goal to remove rules files from git and have the QoR dashboard implement the checking.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

Note for the future: we are working towards a goal to remove rules files from git and have the QoR dashboard implement the checking.

Nice!

How can bazel-orfs get to them?

@oharboe
Collaborator Author

oharboe commented May 16, 2026

@povik @maliberty I have more fine-grained artifact hash checking in another PR for synthesis, but until yosys-slang is idempotent, there is no rush.

@maliberty
Member

There will be an API endpoint.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

There will be an API endpoint.

Anonymous access?

Sounds like a repository rule job.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

@maliberty Have you considered detached git branches as storage? We have had great success with that as a database for rules*.json stuff.

Super easy, fast, no new infrastructure or API endpoints.

I can ask Claude to rearticulate the current infrastructure in that format as a PR with docs to explain.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

@vvbandeira @maliberty FYI

Orphan branch as an append-only database

git checkout --orphan produces a branch with no shared history with master. Used as an append-only data store, with rules-base.json as the worked example.

Prior art

The orphan-branch-as-data-store pattern is the same mechanism that powers gh-pages, where projects publish built documentation alongside source code on a branch that shares no history with master. The Python benchmarking tool airspeed-velocity/asv ships an asv gh-pages command that appends benchmark results to an orphan branch over a project's lifetime, and deployed instances at NumPy, SciPy, and AstroPy show years of accumulated history. The widely-used GitHub Action benchmark-action/github-action-benchmark does the same thing for continuous benchmarking: results land as JSON on the orphan branch (gh-pages by default) and a chart page renders the history.

Layout

The branch is named, for example, rules. It is a true orphan, so git merge-base master rules returns nothing. The branch contains one commit per (design, ORFS SHA) measurement, with the rule file stored at <pdk>/<design>/<orfs-sha-short>.json. The commit message takes the form rules: <pdk>/<design> @ <orfs-sha-short> [<openroad-sha-short> yosys-<v>], and the file contents themselves carry the full version triple — OpenROAD commit, yosys, ABC, PDK commit, ORFS SHA — lifted from metadata.json, so the record is self-describing once extracted. Branch protection forbids force-push and history rewrites.

Writing

The recorder runs in CI on master after a flow run. It uses git worktree add to check out the orphan branch into a sibling directory, leaving the main checkout untouched. It reads metadata.json, writes the rule file, and de-duplicates against existing entries on the branch. It then commits as CI and pushes HEAD:rules.

Reading

checkMetadata.py resolves a rule envelope by looking it up on rules. The resolution order is: exact match on the current ORFS SHA, then merge-base against origin/master, then a walk back through master ancestry until a recorded SHA is found, and finally a checked-in fallback for runs without the branch fetched. Anywhere outside the flow, the data is reachable with git show rules:<pdk>/<design>/<sha>.json from any clone that has fetched the branch.
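The resolution order could look roughly like this — all names here are hypothetical, and the real logic would resolve SHAs by shelling out to git rather than taking them as arguments:

```python
def resolve_rules(current_sha, merge_base_sha, master_ancestry,
                  recorded, fallback):
    """Walk the documented resolution order for a rule envelope.

    recorded:        {orfs_sha: rule_dict} of entries on the `rules` branch
    master_ancestry: master commit SHAs, newest first, for the walk-back
    fallback:        checked-in rules for runs without the branch fetched
    """
    # 1. Exact match on the current ORFS SHA.
    if current_sha in recorded:
        return recorded[current_sha]
    # 2. Merge-base against origin/master.
    if merge_base_sha in recorded:
        return recorded[merge_base_sha]
    # 3. Walk back through master ancestry to the nearest recorded SHA.
    for sha in master_ancestry:
        if sha in recorded:
            return recorded[sha]
    # 4. Checked-in fallback.
    return fallback
```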

Costs

The orphan branch trails master, since it only contains what master CI has actually published; PR baselines therefore resolve at the merge-base SHA, not at the branch's HEAD. A default git clone does not fetch orphan branches, so checkouts need an explicit git fetch origin rules:rules. make update_rules becomes a network operation, which means contributors without push rights need a local-only fallback. Schema evolution behaves the way it always does — adding a metric is safe if readers tolerate absent fields in older entries, but renaming is a one-way door. A code change and its corresponding rule change live on different histories, so the correlation between them is reconstructed by SHA in tooling rather than by git log. The branch grows append-only and git log rules slows over time. Contributors hold a two-branch mental model, and a pre-commit hook is needed to catch attempts to commit rule changes to master. Finally, the branch is only as fresh as the last green master CI run.

bazel-orfs

bazel-orfs consumes the same data through a Bazel repository rule (or module extension) that fetches the rules branch at a pinned commit and exposes the per-design JSON files as a filegroup. Designs that run a metric check depend on @rules//<pdk>/<design>:<sha>.json as a Bazel input, which keeps the lookup inside the hermetic sandbox without giving build actions network or implicit git access. Pinning the orphan-branch SHA in MODULE.bazel (or a lockfile) makes the build reproducible: a given bazel-orfs revision always resolves to the same rule envelope, independent of new measurements landing on rules. Bumping the pin is a one-line PR, reviewable like any other dependency bump. Writing is unchanged — the recorder runs outside bazel build, in the CI driver that wraps it, using the same worktree-and-push sequence as the Make-based flow. The resolution-order logic (exact ORFS SHA, then merge-base, then ancestry, then fallback) moves into the repository rule, since the Bazel action itself only sees the resolved file.

Migration

For each existing rules-base.json, the master commit that last touched it is found with git log -1 --format=%H -- <path>, and the file is replayed onto rules with that commit's author date. No measurements are invented.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

@povik @maliberty yosys idempotency is a problem worth giving up on...

Orphan branch as a cache for synthesized .rtlil

yosys-slang, the SystemVerilog frontend plugin for yosys, is non-idempotent: re-running it on the same SV sources produces a different .rtlil because of pointer-iteration order inside the elaborator. Upstream treats this as expected. Downstream, every flow run absorbs the drift, every QoR check becomes noisy, and version skew between developers becomes a debugging trap. The mechanism that worked for rules-base.json — an orphan branch as an append-only data store — applies here too. The orphan branch holds canonical .rtlil files captured at the post-frontend point, regenerated only on a deliberate yosys-slang (or yosys) upgrade. A sidecar YAML enumerates every input that went into producing the file, so the cache key is explicit and the file is rebuilt when any input changes.

Prior art

Content-addressed caches for slow or non-deterministic build steps are universal. ccache and sccache cache compiler output keyed on preprocessor input. Bazel's remote action cache does the same thing at action-graph granularity. The Nix store at /nix/store/<hash>-<name> is a filesystem-backed instance of the same idea. This proposal uses the same content-addressing pattern, transported over git as an orphan branch in the same way that gh-pages, airspeed-velocity/asv, and benchmark-action/github-action-benchmark use orphan branches to persist data alongside source.

Layout

The branch is named, for example, rtlil-cache. It is a true orphan, sharing no history with master. For each cached entry it holds two files at <pdk>/<design>/<input-hash>.rtlil and <pdk>/<design>/<input-hash>.yaml, where <input-hash> is a content hash over the sidecar YAML. The sidecar enumerates every input that affects the frontend's output: yosys-slang commit SHA, yosys version string and commit SHA, ABC commit (if captured post-ABC), the SHA256 of each HDL source file consumed, the SHA256 of any include files and package files reachable from the elaboration root, every option passed to read_slang and the surrounding yosys script, and the exact command sequence that was executed up to the capture point. The file's own hash is the cache key, so the YAML is self-describing and the .rtlil is reproducible from it. Branch protection forbids force-push and history rewrites.
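A sketch of the key derivation, assuming the sidecar is hashed as canonical bytes (the helper names, truncation length, and path layout below are illustrative):

```python
import hashlib


def input_hash(sidecar_text):
    """Cache key: content hash over the sidecar YAML's bytes, truncated
    for use as a filename.  Any change to any enumerated input (tool SHA,
    source SHA256, frontend option) changes the key and forces a rebuild."""
    return hashlib.sha256(sidecar_text.encode("utf-8")).hexdigest()[:16]


def cache_paths(pdk, design, sidecar_text):
    """Both files of an entry sit side by side under the same key."""
    key = input_hash(sidecar_text)
    return (f"{pdk}/{design}/{key}.rtlil",
            f"{pdk}/{design}/{key}.yaml")
```

Because the key is derived from the sidecar rather than from the .rtlil, the non-deterministic output never participates in cache addressing — only the inputs do.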

Writing

New entries are produced in two contexts. The first is a deliberate yosys-slang/yosys upgrade sweep: after bumping the submodule on master, a CI job elaborates every (design, option-set) in the matrix once, generating one .rtlil per combination, and pushes all of them to rtlil-cache in a single coordinated commit series. The non-determinism is resolved by first-write-wins — whatever the frontend produced on that sweep run becomes the canonical output for that input hash. The second is on-demand: when a flow run hits a cache miss (new design, modified options, modified RTL), CI runs the frontend to fill the entry and pushes it. In both cases the recorder uses git worktree add to check out the orphan branch into a sibling directory, writes the pair of files, de-duplicates against existing entries by input hash, and pushes.

Reading

At flow time the orchestrator computes the input hash from the current YAML, looks up <pdk>/<design>/<hash>.rtlil on rtlil-cache, and on a hit reads the cached file directly instead of running the frontend. Downstream yosys passes and ABC continue from that canonical RTLIL as they would today. On a miss it falls through to running the frontend, with the recorder publishing the result so subsequent runs hit the cache. Outside the flow the cache is reachable with git show rtlil-cache:<pdk>/<design>/<hash>.rtlil from any clone that has fetched the branch.
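The hit/miss path might be sketched as follows; `run_frontend` and `record` are stand-ins for the real flow steps, not an actual orchestrator API:

```python
import os


def lookup_rtlil(cache_root, pdk, design, key, run_frontend, record):
    """On a hit, return the cached canonical RTLIL; on a miss, run the
    frontend once and publish the result so subsequent runs hit."""
    path = os.path.join(cache_root, pdk, design, key + ".rtlil")
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    rtlil = run_frontend()   # the only non-deterministic step
    record(path, rtlil)      # first-write-wins: this output becomes canonical
    return rtlil
```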

Costs

The cache is only sound if the YAML enumerates every input that affects the frontend's output. A missed input means a stale .rtlil is reused after that input has changed, producing silent QoR drift. Adding inputs over time is safe — the hash changes, the cache misses, the entry is regenerated — but discovering a missing input retroactively means treating older entries as suspect. .rtlil files are large; the orphan branch grows fast, and git-lfs or shallow fetches may be needed. A yosys-slang or yosys upgrade is no longer a one-line submodule bump but a deliberate event with a CI sweep attached; until the sweep finishes, the cache is stale for the new version and either the flow falls back to running the frontend (re-introducing the drift the cache exists to suppress) or refuses to build. The capture point is a choice with consequences: cutting post-frontend freezes yosys-slang output but lets later yosys passes run live; cutting post-synth or post-ABC freezes more of the pipeline but bakes in ABC's own non-determinism. Contributors hold a two-branch mental model, and a default git clone does not fetch the orphan branch. The first-write-wins resolution means whichever machine ran the sweep determines the canonical output for everyone, which is the point but is worth stating out loud.

bazel-orfs

bazel-orfs consumes the cache through a repository rule that fetches rtlil-cache at a pinned commit and exposes the entries as a content-addressed filegroup. The frontend Bazel action depends on the cached .rtlil as an input rather than running yosys-slang, which removes the non-deterministic step from the hermetic action graph for the common case. The action is still defined so cache misses fall through to it, but on cache hits the action becomes a file-copy and Bazel's own remote cache handles it from there. Pinning the orphan-branch SHA in MODULE.bazel makes the build reproducible: a given bazel-orfs revision always resolves to the same .rtlil, independent of new entries landing on rtlil-cache. A yosys-slang or yosys upgrade is a coordinated change — bump the dependency, run the sweep, push to rtlil-cache, advance the pin — reviewable as a single PR.

Migration

For each (design, option-set) currently exercised on master, run the frontend once at the current pinned yosys-slang and yosys versions, capture the resulting .rtlil and the YAML enumerating the inputs that produced it, and push the pair to rtlil-cache. This yields a baseline cache covering today's matrix at today's tool versions. Designs added later or option-sets exercised later fill the cache on demand through the cache-miss path.

@povik
Contributor

povik commented May 16, 2026

yosys-slang, the SystemVerilog frontend plugin for yosys, is non-idempotent: re-running it on the same SV sources produces a different .rtlil because of pointer-iteration order inside the elaborator. Upstream treats this as expected.

This is not true. Neither me (for yosys-slang) or the yosys team treats non-determinism as expected.

@oharboe
Collaborator Author

oharboe commented May 16, 2026

yosys-slang, the SystemVerilog frontend plugin for yosys, is non-idempotent: re-running it on the same SV sources produces a different .rtlil because of pointer-iteration order inside the elaborator. Upstream treats this as expected.

This is not true. Neither me (for yosys-slang) or the yosys team treats non-determinism as expected.

Claude says things that are not true.

@maliberty
Member

Pass. We can discuss at more length whenever we talk next.

